Evaluating Methods for Distinguishing Between Human-Readable Text and Garbled Text

Henderson, Jette L. (The University of Texas at Austin) | Frazee, Daniel J. (The University of Texas at Austin) | Siegel, Nick P. (The University of Texas at Austin) | Martin, Cheryl E. (The University of Texas at Austin) | Liu, Alexander Y. (The University of Texas at Austin)

AAAI Conferences

In some cybersecurity applications, it is useful to differentiate between human-readable text and garbled text (e.g., encoded or encrypted text). Automated methods are necessary for performing this task on large volumes of data. Which method is best is an open question that depends on the specific problem context. In this paper, we explore this open question via empirical tests of many automated categorization methods for differentiating human-readable versus garbled text under a variety of conditions (e.g., different class priors, different problem contexts, concept drift, etc.). The results indicate that the best approaches tend to be either variants of naïve Bayes or classifiers that use low-dimensional, structural features. The results also indicate that concept drift is one of the most problematic issues when classifying garbled text.
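To give a flavor of the "low-dimensional, structural features" the abstract refers to, the sketch below classifies a string as readable or garbled using two simple features: character-level Shannon entropy and the fraction of alphabetic/whitespace characters. The specific features, thresholds, and function names here are illustrative assumptions, not the paper's actual method or parameters.

```python
import math
from collections import Counter

def structural_features(text: str) -> dict:
    """Compute two low-dimensional structural features of a string.

    These particular features are an illustrative choice, not the
    feature set used in the paper.
    """
    n = len(text)
    if n == 0:
        return {"entropy": 0.0, "alpha_ratio": 0.0}
    counts = Counter(text)
    # Character-level Shannon entropy in bits per character.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Fraction of characters that are letters or whitespace.
    alpha_ratio = sum(1 for ch in text if ch.isalpha() or ch.isspace()) / n
    return {"entropy": entropy, "alpha_ratio": alpha_ratio}

def looks_garbled(text: str,
                  entropy_max: float = 4.5,
                  alpha_min: float = 0.8) -> bool:
    """Flag text as garbled when entropy is high or few characters
    are alphabetic. The thresholds are illustrative defaults only."""
    f = structural_features(text)
    return f["entropy"] > entropy_max or f["alpha_ratio"] < alpha_min

print(looks_garbled("this is a plain english sentence"))  # readable
print(looks_garbled("x9$#kq@!zp%&"))                      # garbled
```

In practice, such features would feed a trained classifier rather than fixed thresholds, and (as the abstract notes) concept drift means any fixed decision rule can degrade as the distribution of encodings changes over time.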